United States Patent [19] 

Neufeld 



US005974438A
[11] Patent Number: 5,974,438
[45] Date of Patent: *Oct. 26, 1999



[54] SCOREBOARD FOR CACHED MULTI-THREAD PROCESSES

[75] Inventor: E. David Neufeld, Tomball, Tex. 

[73] Assignee: Compaq Computer Corporation, 
Houston, Tex. 

[21] Appl. No.: 08/775,144 
[22] Filed: Dec. 31, 1996 

[51] Int. Cl.6 G06F 9/00

[52] U.S. Cl. 709/104; 711/144

[58] Field of Search 395/650, 200, 457, 677, 800.01, 670, 673, 674; 711/144, 133, 135; 709/103, 104, 107, 100; 712/1

[56] References Cited 

U.S. PATENT DOCUMENTS 

5,261,053 11/1993 Valencia 395/200 

5,287,508 2/1994 Hejna, Jr. et al 395/650 

5,317,738 5/1994 Cochcroft, Jr. et al 395/650 

5,379,428 1/1995 Belo 395/650 

5,437,032 7/1995 Wolf et al 395/650 

5,630,130 5/1997 Perotto et al 395/677 

5,701,432 12/1997 Wong et al 395/457 

5,717,926 2/1998 Browning et al 395/674 

5,745,778 4/1998 Alfieri 395/800.01 

Primary Examiner — Alvin E. Oberley 
Assistant Examiner — Yveste G. Cherubin 



Attorney, Agent, or Firm — Paul Katz Frohwitter 
[57] ABSTRACT 

A computer system comprising at least one processor and associated cache memory, and a plurality of registers to keep track of the number of cache memory lines associated with each process thread running in the computer system. Each process thread is assigned to one of the plurality of registers of each level of cache that is being monitored. The number of cache memory lines associated with each process thread in a particular level of the cache is stored as a number value in the assigned register; this value increments as more cache memory lines are used for the process thread and decrements as fewer cache memory lines are used. The number value in the register is defined as the "process thread temperature." Larger number values indicate a warmer process thread temperature and smaller number values indicate a cooler process thread temperature. Process thread temperatures are relative and indicate the cache memory line usage by the process threads running in the computer system at a particular level of cache. By keeping track or "score" of the number values (temperatures) in each of these registers, called "scoreboard registers," the scheduler algorithm of the computer operating system may objectively determine the most advantageous order for the process threads to run and which of the processors in a multi-processor system should execute these process threads. A scoreboard register may be reassigned to a new process thread when its associated process thread has been discontinued.

14 Claims, 6 Drawing Sheets 
SCOREBOARD FOR CACHED MULTI-THREAD PROCESSES

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to cache memory utilization 
in a single or multi-processor computer system, and more 
particularly in determining the order of execution of mul- 
tiple program process threads and the association of com- 
puter processors therewith. 

2. Description of the Related Technology 

Use of computers in business and at home is becoming 
more and more pervasive because the computer has become 
an integral tool of most information workers who work in 
the fields of accounting, law, engineering, insurance, 
services, sales and the like. Rapid technological improve- 
ments in the field of computers have opened up many new 
applications heretofore unavailable or too expensive for the 
use of older technology computers. A significant part of the 
ever increasing popularity of the computer, besides its low 
cost relative to just a few years ago, is its ability to run 
multiple application programs which appear to the user to be 
running concurrently. In a multi-processor computer system 
some of the application programs may run concurrently. 

These application programs may be word processing, spreadsheet, database, graphics, computer aided design and engineering, and telecommunications programs, to name a few. In the computer system there is a software operating system ("OS") that controls the functions of the computer processor(s) and the peripheral components that make up the computer system. The OS program includes routines, or an algorithm, called a "scheduler" that decides which application program(s) are running on the processor(s) and which application program(s) will run next.

The scheduler algorithm defines each part of the application program that is being executed on a processor as a "process" or "process thread." When a process thread is interrupted for whatever reason, the register values and unexecuted instructions of that process thread are saved, and the scheduler tells the processor to restore the register values of a different process and start execution of another process thread. This is called a "Context Switch." In a multi-processor computer system, multiple process threads may be executed on the multiple processors at the same time. However, only one process thread can run on one processor at any given time, so multiple processors can concurrently run as many process threads as there are processors in the computer system. The application programs can appear to the user to be running concurrently because the processor(s) switch(es) between the different threads very quickly, thus giving the impression that the application programs are running simultaneously.

The processor or plurality of processors in a computer system run in conjunction with a high-capacity, low-speed (relative to the processor speed) main memory, and a low-capacity, high-speed (comparable to the processor speed) cache memory or memories (one or more cache memories associated with each of the plurality of processors).

Cache memory is used to reduce memory access time in 
mainframe computers, minicomputers, and microproces- 
sors. The cache memory provides a relatively high speed 
memory interposed between the slower main memory and 
the processor to improve effective memory access rates, thus 
improving the overall performance and processing speed of 
the computer system by decreasing the apparent amount of
time required to fetch information from main memory. 

In today's single and multi-processor computer systems, there is typically at least one level of cache memory for each of the processors. The latest microprocessor integrated circuits may have a first level cache memory located in the integrated circuit package and closely coupled with the central processing unit ("CPU") of the microprocessor. Additional levels of cache may also be implemented by adding fast static random access memory (SRAM) integrated circuits and a cache controller. Typical secondary cache size may be anywhere from 64 kilobytes to 8 megabytes, and the cache SRAM has an access time comparable with the processor clock speed.

In common usage, the term "cache" refers to a hiding place. The name "cache memory" is an appropriate term for this high speed memory that is interposed between the processor and main memory because cache memory is hidden from the user or programmer, and thus appears to be transparent. Cache memory, serving as a fast storage buffer between the processor and main memory, is not user addressable. The user is only aware of the apparently higher-speed memory accesses because the cache memory is satisfying many of the requests instead of the slower main memory.

Cache memory is smaller than main memory because cache memory employs relatively expensive high speed memory devices, such as static random access memory ("SRAM"). Therefore, cache memory typically will not be large enough to hold all of the information needed during program execution. As a process executes, information in the cache memory must be replaced, or "overwritten," with new information from main memory that is necessary for executing the process thread. The information in main memory is typically updated each time a "dirty" (modified) cache line is evicted from the cache memory, a process called "write back." As a result, changes made to information in cache memory will not be lost when new information enters cache memory and overwrites information which may have been changed by the processor.
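The write-back behavior described above can be illustrated with a minimal sketch. It assumes a toy model in which main memory is a Python dict keyed by line address and each cached line carries a dirty flag; `WriteBackCache` and its methods are hypothetical names, and real cache controllers do this in hardware. The point it shows is that a modified line reaches main memory only when it is evicted.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    """One cached line: its main-memory address, its data, and a dirty flag."""
    address: int
    data: bytes
    dirty: bool = False


class WriteBackCache:
    """Toy write-back cache (hypothetical): writes stay in the cache until
    the line is evicted, at which point dirty data is copied back."""

    def __init__(self, main_memory: dict):
        self.main_memory = main_memory
        self.lines = {}  # address -> CacheLine

    def read(self, address: int) -> bytes:
        if address not in self.lines:  # cache miss: fetch from main memory
            self.lines[address] = CacheLine(address, self.main_memory[address])
        return self.lines[address].data

    def write(self, address: int, data: bytes) -> None:
        self.read(address)  # allocate the line on a write miss
        line = self.lines[address]
        line.data = data
        line.dirty = True  # main memory is now stale for this line

    def evict(self, address: int) -> None:
        line = self.lines.pop(address)
        if line.dirty:  # "write back" only lines the processor modified
            self.main_memory[address] = line.data
```

Evicting a clean line, by contrast, simply drops it, because main memory already holds the same data.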

Information is only temporarily stored in cache memory during execution of the process thread. When process thread data is referenced by a processor, the cache controller will determine if the required data is currently stored in the cache memory. If the required information is found in cache memory, this is referred to as a "cache hit." A cache hit allows the required information to be quickly retrieved from or modified in the high speed cache memory without having to access the much slower main memory, thus resulting in a significant savings in program execution time. When the required information is not found in the cache memory, this is referred to as a "cache miss." A cache miss indicates that the desired information must be retrieved from the relatively slow main memory and then placed into the cache memory. Cache memory updating and replacement schemes attempt to maximize the number of cache hits, and to minimize the number of cache misses.
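The hit/miss mechanics above, together with the set-associative lookup and LRU replacement described in the paragraphs that follow, can be sketched as a small simulator. It assumes 16-byte lines (so the low 4 address bits form the block offset, matching the tag example in the text), 4 sets and 4 ways; these sizes and the class name `SetAssociativeCache` are illustrative choices rather than parameters taken from the patent.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache with LRU replacement (illustrative)."""

    def __init__(self, line_size: int = 16, num_sets: int = 4, ways: int = 4):
        # Sizes are assumed to be powers of two so addresses split into bit fields.
        self.offset_bits = line_size.bit_length() - 1  # 16 bytes -> 4 bits
        self.index_bits = num_sets.bit_length() - 1    # 4 sets -> 2 bits
        self.ways = ways
        # Each set maps tag -> None, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def split(self, address: int) -> tuple:
        """Split an address into (tag, index, block offset) fields."""
        offset = address & ((1 << self.offset_bits) - 1)
        index = (address >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = address >> (self.offset_bits + self.index_bits)
        return tag, index, offset

    def access(self, address: int) -> bool:
        """Return True on a cache hit, False on a cache miss."""
        tag, index, _ = self.split(address)
        lines = self.sets[index]        # the index field selects the set
        if tag in lines:                # compare tags of the lines in the set
            lines.move_to_end(tag)      # mark the line most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) == self.ways:     # set full: replace the LRU way
            lines.popitem(last=False)
        lines[tag] = None               # fill the line from main memory
        return False

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With power-of-two sizes the three address fields fall out of shifts and masks. Accessing five lines whose addresses are 64 bytes apart (16-byte lines times 4 sets) sends every one of them to set 0, so in this 4-way cache the fifth fill evicts the least recently used line, the first one.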

Information from main memory for a process thread is typically stored in "lines" of cache memory, each of which contains a plurality of bytes or words from the main memory such as, for example, 16, 32 or 64 bytes of information. The plurality of bytes from main memory are stored sequentially in a line of cache memory. The cache memory comprises a plurality of lines of information that may store information for a plurality of process threads. Each line of cache memory has an associated "tag" that stores the physical address of main memory containing the information in the cache line as well as other things such as "MESI" state information for the cache line. From the example above, if 16 bytes of information are stored in a cache line, the least significant 4 bits of the physical address of main memory are dropped from the main memory address stored in the tag register, since those bits only select one of the 16 bytes within the line. In addition, the tag register may contain state bits for a cache consistency protocol such as "MESI" (Modified, Exclusive, Shared and Invalid) to ensure data consistency in a multi-processor or bus master environment.

A cache memory is said to be "direct mapped" if each byte of information can only be written to one place in the cache memory. The cache memory is said to be "fully associative" if a byte of information can be placed anywhere in the cache memory. The cache memory is said to be "set associative" if a group of blocks of information from main memory can only be placed in a restricted set of places in the cache memory, namely, in a specified "set" of the cache memory. Computer systems ordinarily utilize a variation of set associative mapping to keep track of the bytes of information that have been copied from main memory into cache memory.

The hierarchy of a set associative cache memory resembles a matrix. That is, a set associative cache memory is divided into different "sets" (such as the rows of a matrix) and different "ways" (such as the columns of a matrix). Thus, each line of a set associative cache memory is mapped or placed within a given set (row) and within a given way (column). The number of columns, i.e., the number of lines in each set, determines the number of "ways" of the cache memory. Thus, a cache memory with four columns (four lines within each set) is deemed to be "4-way set associative."

Set associative cache memories include addresses for each line in the cache memory. Addresses may be divided into three different fields. First, a "block-offset field" is utilized to select the desired information from a line. Second, an "index field" specifies the set of cache memory where a line is mapped. Third, a "tag field" is used for purposes of comparison. When a request originates from a processor for new information, the index field selects a set of cache memory. The tag field of every line in the selected set is compared to the tag field sought by the processor. If the tag field of some line matches the tag field sought by the processor, a "cache hit" is detected and information from the block is obtained directly from or modified in the high speed cache memory. If no match occurs, a "cache miss" occurs and the cache memory is typically updated. Cache memory is updated by retrieving the desired information from main memory and then mapping this information into a line of the set associative cache. When the "cache miss" occurs, a line is first mapped with respect to a set (row), and then mapped with respect to a way (column). That is, the index field of a line of information retrieved from main memory specifies the set of cache memory wherein this line will be mapped. A "replacement scheme" is then relied upon to choose the particular line of the set that will be replaced. In other words, a replacement scheme determines the way (column) where the line will be located. The object of a replacement scheme is to select for replacement the line of the set that is least likely to be needed in the near future so as to minimize further cache misses.

Several factors contribute to the optimal utilization of cache memory in computer systems: cache memory hit ratio (probability of finding a requested item in cache), cache memory access time, delay incurred due to a cache memory miss, and time required to synchronize main memory with cache memory (write back or write through). In order to minimize delays incurred when a cache miss is encountered, as well as improve cache memory hit rates, an appropriate cache memory replacement scheme is used.

Set associative cache memory replacement schemes may be divided into two basic categories: non-usage based and usage based. Non-usage based replacement schemes, which include first in, first out ("FIFO") and "random" replacement schemes, make replacement selections on some basis other than memory usage. The FIFO replacement scheme replaces the line of a given set of cache memory which has been contained in the given set for the longest period of time. The random replacement scheme randomly replaces a line of a given set.

Usage based schemes, which include the least recently used ("LRU") replacement scheme, take into account the history of memory usage. In the LRU replacement scheme the least recently used line of information in cache memory is overwritten by the newest entry into cache memory. An LRU replacement scheme assumes that the least recently used line of a given set is the line that is least likely to be reused in the immediate future. An LRU replacement scheme thus replaces the least recently used line of a given set with a new line of information that must be copied from main memory.

Regardless of the replacement scheme used, the scheduler algorithm will decide what process thread will be executed next in a single processor computer. In a multi-processor computer system, the scheduler algorithm decides what process threads are to run concurrently, and which processor will execute each of these process threads. The scheduler then determines the next appropriate process thread to be executed, and so on. The scheduler may cause the process threads to be executed in order of occurrence, or the order of execution may be determined by some software or hardware priority paradigm. The scheduler cannot determine, however, what the likely cache hit or miss outcome will be during execution of any given process thread. Some operating systems use a concept called "Strong Affinity" when scheduling threads. "Strong Affinity" schedulers attempt to execute a thread on the same processor it last ran on. The reason is that the same processor's cache is more likely to contain data relevant to the process than the cache of some other processor in the system.

What is needed is a method and apparatus to improve the likelihood of cache hits during execution of a process thread. It is desired to improve the computer system efficiency by having the scheduler algorithm make an informed decision about which process thread would be most appropriate to run next and on what processor. In addition, it is desired to improve usage of cache memory by selecting locations to be written to that contain no longer needed process thread information.

OBJECTS OF THE INVENTION

It is therefore an object of the present invention to improve the efficiency of a computer system by keeping track of the number of cache memory lines containing information for each of the active process threads.

It is a further object of the present invention to assign a counter to each process thread running in the computer system and to increment or decrement this counter each time a cache memory line is used to store information for the process thread or is forced out from the cache associated with the counter, respectively.

It is a further object to provide the scheduler algorithm with the number of cache memory lines storing information for each process thread.
It is a further object to select the next process thread to run based on the number of cache memory lines having information for that process thread.

It is a further object to select a certain processor in a multiprocessor computer system to run a certain process thread based on the number of cache memory lines having information for that certain processor and that certain process thread.

It is a further object to use the relative amount of retained data in cache memory, or cache temperature (more fully described hereinafter), of a process in conjunction with other information to assist the scheduler algorithm in making a more informed decision about which process to dispatch next on a processor.

It is a further object of the present invention to have a plurality of counters, each of the plurality of counters associated with a process thread running in the computer system. When a cache memory line is written to for a certain process thread, the associated counter is incremented. When a cache memory line containing information for that certain process thread is overwritten by data for another process thread, the associated counter is decremented.

It is a further object to compare the number of cache memory lines used while a process thread was running with the number of cache memory lines remaining that still have information for that process thread, and to select the next process thread to be run based on the thread having a high percentage of remaining cache memory lines containing information for the process thread.

It is a further object that cache temperatures of different process threads can be used to assist the scheduler algorithm in making a better decision about the order in which the different process threads should be executed in order to minimize "cache thrashing."

It is a further object that cache temperatures of different process threads can be used to detect processes that heavily use the same locations in the cache memory, so that the computer system may operate more efficiently by isolating these processes to different processors.

It is a further object to normalize cache temperature on a per process basis to assist the scheduler algorithm in making a better decision about which process to schedule next and to determine what processor a process might be scheduled upon.

It is a further object of the present invention to overwrite the cache memory lines containing information of a discontinued process thread before overwriting the cache memory locations of running process threads, even if the discontinued thread's line is the most recently used (MRU) line in the set.

It is an object of the present invention that system performance can be increased by varying the duty cycle at which various process threads are executed. Process threads that cause a lot of cache evictions (cache thrashing) can be run more optimally by allowing them to run less often but for a longer period of time.

SUMMARY OF THE INVENTION

The above and other objects of the present invention are satisfied, at least in part, by providing a plurality of counter/registers that may be associated with each process thread under the control of a scheduler algorithm in the operating system software of a computer system. An embodiment of the present invention utilizes a plurality of counter/registers that are associated with the process threads that are presently active in the computer system.

Initially, when a first process thread is assigned by the scheduler algorithm to be executed on a processor, cache memory lines are filled with data and instructions ("information") for that first process thread. Each time a memory line of the cache memory is used to store information for the first process thread, the present invention will increment a numerical value stored in an associated first register. When a second process thread is scheduled to be executed, its information is stored in the cache memory, and a numerical value stored in a second register associated with the second process thread is incremented according to the number of cache memory lines used for the second process thread. This goes on for each process thread running in the computer system. Thus, a register associated with a process thread contains the number of cache memory lines used to store information for that process thread. When a cache line belonging to a particular process thread is overwritten, the associated counter for that process thread is similarly decremented. The number of cache memory lines used to store information for a running first process thread may be different from the number of cache memory lines storing information for that first process thread when it is not running. This is so because other running process threads, i.e., second, third, fourth, etc., may require some of the same cache locations used by the first process thread, thus reducing the number of cache lines containing information for the first process thread. When the first process thread runs again, the number of cache memory lines containing information for the first process thread may or may not be substantially reduced, depending on the cache memory activity of the previously running other process threads. An absolute number of cache memory lines containing information for a process thread does not necessarily indicate the comprehensiveness and completeness of that information. One process may only use a small amount of information when running, while another process may need a substantially greater amount of information when running. Thus absolute numbers of cache memory lines are not necessarily indicative of the completeness of the information stored for each process thread. According to the present invention, a relative number or percentage of cache lines containing information for a process thread may be utilized in determining which process thread to run next. The relative number or percentage of cache lines remaining for a process thread that had previously run may be contained in an associated register; when few or none of that thread's cache lines have been overwritten, this relative number is high. The higher this relative number, the greater the amount of information for the associated process thread that remains in the cache memory.

In a multi-processor computer system, the scheduler algorithm also decides which processor is to run a selected process thread. Typically, each processor of the multi-processor computer system will have its own cache memory, which may comprise a single or multiple level cache memory system. Thus in a multi-processor system, counter/registers will be assigned to each cache memory associated with a processor, and in a multi-level cache memory, counter/registers may be assigned to each level thereof. When a process thread runs on more than one processor of a multi-processor computer system, there will be more than one register associated with that process thread (only one register, however, is needed per level of cache memory associated with a processor of the multi-processor computer system for each process thread). Any level of cache in a system can benefit from the addition of these counters. The more levels of cache that have these counters in a system, the more information is available to the scheduler algorithm to make a better and more informed decision about the execution order and location for process threads. The counter/registers of the present invention are used to indicate how many cache memory lines contain information for each of the process threads running in the computer system, and which processor cache memory (single or multi-level) contains the respective information for each of the process threads.
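The counting scheme summarized above can be sketched for a single cache as follows. The sketch assumes that each cache line records which thread filled it (an owner field the text implies but does not specify) and uses a fully associative LRU cache for brevity; `ScoreboardCache` is a hypothetical name. The invariant it maintains is that each thread's counter equals the number of lines currently holding that thread's information.

```python
from collections import Counter, OrderedDict


class ScoreboardCache:
    """Fully associative LRU cache whose lines remember the thread that
    filled them, plus one scoreboard counter per thread (sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()   # line address -> owning thread, LRU first
        self.scoreboard = Counter()  # thread -> lines currently held

    def access(self, thread: str, line_address: int) -> None:
        owner = self.lines.get(line_address)
        if owner == thread:
            # Hit on one of the thread's own lines: only the LRU order changes.
            self.lines.move_to_end(line_address)
            return
        if owner is not None:
            # Assumption: a line touched by another thread is re-attributed
            # to that thread, so the previous owner's count goes down.
            del self.lines[line_address]
            self.scoreboard[owner] -= 1
        elif len(self.lines) == self.capacity:
            # Miss with the cache full: evict the least recently used line
            # and decrement the counter of the thread that owned it.
            _, victim = self.lines.popitem(last=False)
            self.scoreboard[victim] -= 1
        # Fill the line for this thread and increment its counter.
        self.lines[line_address] = thread
        self.scoreboard[thread] += 1
```

In a multi-processor or multi-level system, one such scoreboard would exist per cache, so the same thread can have a different count in each.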

Hereinafter the aforementioned counter/registers of the present invention will be referred to as "scoreboard registers" and will be used to indicate how "hot" or "cold" the information of a particular process thread is relative to the information of the other process threads in the cache memory. Thus a process thread retaining much of its information in the cache memory will be "hot" as to that cache memory and associated processor. Conversely, another process thread retaining only a small portion of its information in the cache memory will be "cold." The "temperature" of a process thread relative to the other active process threads in the computer system may thus be determined by referring to the value in the scoreboard register(s) associated with that process thread. Larger values will typically be "hotter" than smaller values.

If a second process thread uses any of the cache locations previously used by the first process thread, then the first scoreboard register (associated with the first process thread) is decremented by the number of cache memory lines evicted when loading information for the second process thread. In this way, the scoreboard registers keep track of the relative temperature (proportional amount of information stored in cache memory) for each process thread. The scoreboard registers may be associated with the tag fields of each cache memory, i.e., the value of a scoreboard register indicates the number of tag addresses pointing to the cache memory lines containing information of a certain process thread.
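One way to turn raw counts into the relative temperature described above is sketched below. It assumes the scheduler records each thread's line count at the moment the thread was switched out and divides the count remaining now by that figure, in line with the stated object of comparing lines used while running with lines remaining; `relative_temperature`, `pick_hottest` and this particular normalization are illustrative assumptions, not the patent's formula.

```python
def relative_temperature(lines_remaining: int, lines_at_switch_out: int) -> float:
    """Fraction of a thread's cached lines that survived since it last ran.

    1.0 means nothing was overwritten ("hot"); 0.0 means everything was
    evicted ("cold"). A thread that held no lines is treated as cold.
    """
    if lines_at_switch_out <= 0:
        return 0.0
    return min(lines_remaining / lines_at_switch_out, 1.0)


def pick_hottest(threads: dict) -> str:
    """Return the ready thread whose cached working set is most intact.

    `threads` maps a thread name to (lines_remaining, lines_at_switch_out).
    """
    return max(threads, key=lambda name: relative_temperature(*threads[name]))


# Usage example with made-up counts.
ready = {"editor": (40, 50), "compiler": (900, 4000), "daemon": (2, 2)}
choice = pick_hottest(ready)  # "daemon": 100% of its lines remain
```

In this example the absolute counts alone would favor `compiler` (900 lines), although only 22.5% of its working set survived; the normalized value favors `daemon`, whose small working set is fully intact. This is the distinction the text draws between absolute and relative numbers of cache lines.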

According to the present invention, the scheduler algorithm will designate a scoreboard register for each process thread that has run, is running, or will soon be running in the computer system. In this way, a "scoreboard map" is created representing the information distribution in each of the cache memories for all process threads that have run or are running. A problem could occur, however, if there are more process threads than available scoreboard registers. The scheduler algorithm could reassign a scoreboard register from a discontinued process thread to a new process thread using a paradigm similar to the LRU cache memory replacement scheme mentioned above. In the event there are not enough counters in the system to address all the processes, the counters could be reused in several different ways. A counter owned by a process that is believed not to use cache locations that will be altered by other processes in the near term could be used. A process that has a "cold" cache temperature might be able to give up its counter to another process. Other algorithms may similarly be applied.
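The register-reuse options listed above might be combined as in the following sketch. It assumes a fixed pool of scoreboard registers and a software table mapping registers to threads, and applies one possible order of preference: a free register, then a register owned by a discontinued thread, then the register of the coldest live thread. `RegisterPool` is a hypothetical name, and resetting the reclaimed counter to zero is a simplifying assumption; the text leaves open how lines still attributed to the previous owner should be handled.

```python
from typing import Optional


class RegisterPool:
    """Assigns a fixed number of scoreboard registers to threads (sketch)."""

    def __init__(self, num_registers: int):
        self.values = [0] * num_registers  # the scoreboard counter values
        self.owner = {}                    # register number -> thread name
        self.discontinued = set()          # threads that have finished

    def register_of(self, thread: str) -> Optional[int]:
        for reg, name in self.owner.items():
            if name == thread:
                return reg
        return None

    def discontinue(self, thread: str) -> None:
        self.discontinued.add(thread)

    def assign(self, thread: str) -> int:
        """Return a register for `thread`, reclaiming one if necessary."""
        existing = self.register_of(thread)
        if existing is not None:
            return existing
        free = [r for r in range(len(self.values)) if r not in self.owner]
        dead = [r for r, t in self.owner.items() if t in self.discontinued]
        if free:
            reg = free[0]
        elif dead:
            reg = dead[0]  # reuse a discontinued thread's register first
        else:
            # No free or dead registers: take the coldest live thread's.
            reg = min(self.owner, key=lambda r: self.values[r])
        previous = self.owner.get(reg)
        if previous is not None:
            self.discontinued.discard(previous)
        self.owner[reg] = thread
        self.values[reg] = 0  # assumption: restart the count for the new owner
        return reg
```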

Whenever a processor in the computer system executes a process thread assigned to it by the scheduler algorithm, information (data and instructions) for the process thread is retrieved from cache memory. There is usually at least one cache memory associated with the processor of a single processor computer system, and at least one cache memory for each processor of a multi-processor computer system. In an embodiment of the present invention for multi-processor computer systems, the scheduler algorithm assigns a scoreboard register for each process thread having information stored in each cache memory of each processor of the multi-processor system. For example, suppose a first process thread is executed in processor A and a second process thread is executed in processor B, and then the first process thread continues execution in processor B and the second process thread continues execution in processor C. There will be first process thread information stored in the cache for processor A and the cache for processor B. Likewise, there will be second process thread information stored in the cache for processor B and the cache for processor C. There will also be different line counts at the various levels of the cache hierarchy for each process.
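The example above can be replayed with a scoreboard keyed by (processor, thread) pairs, assuming one non-shared cache level per processor. The `run` helper and the line counts passed to it are placeholders invented for illustration; what matters is that each distinct pair needs its own scoreboard register.

```python
from collections import defaultdict

# Scoreboard map: (processor, thread) -> lines held in that processor's cache
# for that thread. Each distinct key corresponds to one scoreboard register.
scoreboard = defaultdict(int)


def run(thread: str, processor: str, lines_filled: int) -> None:
    """Record that `thread` filled `lines_filled` lines in `processor`'s cache."""
    scoreboard[(processor, thread)] += lines_filled


# The execution sequence from the example (line counts are placeholders).
run("first", "A", 10)
run("second", "B", 10)
run("first", "B", 5)
run("second", "C", 5)

registers_needed = len(scoreboard)  # distinct (processor, thread) pairs
```

`registers_needed` comes out as 4, the count given for this example in the next paragraph; adding cache levels or processors multiplies the number of possible pairs.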

In the preceding example, the scheduler algorithm will have used four scoreboard registers for the first and second process threads because each process thread was executed in two different processors and, thus, information is stored for each of the first and second process threads in two different caches. As the number of processors and non-shared cache levels increases, the number of scoreboard registers increases as the product thereof. Therefore, it is preferred that separate temperatures be kept, with separate scoreboard registers, for the number of lines of information in each cache memory for each process thread. The present invention allows the scheduler algorithm to make decisions based on the relative temperatures of the cache for various threads on a given processor and also gives the scheduler algorithm information that could be used to decide on which processor a thread should be run. This feature also enables the scheduler algorithm to make a better decision about which process thread to execute next, and to have better insight into which processor would be the most advantageous to use in executing the process thread. Also, many programs consist of multiple threads to accomplish their work. The scheduler could keep track of which execution order of the threads yields the least contention. The operating system could keep this information so that the next time the application is used, it has a head start and doesn't need to "learn" this order all over again.
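A scheduler could use the per-processor temperatures described above when placing a thread, for example as sketched below. The policy shown (choose the idle processor whose cache holds the most of the thread's lines, breaking ties by processor name) is only one assumption among many; the text leaves the actual decision rule open, and `choose_processor` is a hypothetical name.

```python
from typing import Optional


def choose_processor(temperatures: dict, idle: set) -> Optional[str]:
    """Pick the idle processor whose cache holds the most lines for a thread.

    `temperatures` maps processor name -> scoreboard value for this thread;
    processors missing from it count as 0 (cold). Returns None if no
    processor is idle. Iterating over the sorted names makes max() return
    the alphabetically first processor among equally warm ones.
    """
    if not idle:
        return None
    return max(sorted(idle), key=lambda cpu: temperatures.get(cpu, 0))
```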

A feature of the present invention is that a finished or 
discontinued process thread may be flagged with the score- 
board register and the associated cache memory lines of the 
discontinued process thread may be used immediately in 
storing information for new process threads. A preferred 
application of the present invention is to allow the scheduler 
algorithm to indicate to the cache temperature counters that 
a particular process is being terminated and that cache lines 
owned by it can be marked as existing in the "Invalid" state of the MESI protocol. This would allow the cache controller to automatically reuse cache locations that are not being used by any existing process. This could be accomplished by telling the cache controller when to cycle through all the cache lines looking for locations owned by dead processes. This may be accomplished as a background operation.
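A minimal sketch of the background sweep, assuming a simplified software model in which each cache line records its owner's PID and a one-letter MESI state. `CacheLine`, `sweep_dead_lines`, and `live_pids` are hypothetical names; in the patent this work is done by the cache controller, and a real controller would normally write back a Modified line before invalidating it, which this sketch does not model.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int
    pid: int   # PID of the thread that owns the line
    mesi: str  # one of "M", "E", "S", "I"

def sweep_dead_lines(lines, live_pids, scoreboard):
    """Mark lines of terminated threads Invalid; return how many lines were freed."""
    freed = 0
    for line in lines:
        if line.mesi != "I" and line.pid not in live_pids:
            # Assumption: write-back of a Modified line is not modelled here.
            line.mesi = "I"  # the replacement logic may now reuse this line
            freed += 1
    # Scoreboard registers of terminated threads are released for reuse.
    for pid in list(scoreboard):
        if pid not in live_pids:
            del scoreboard[pid]
    return freed

lines = [CacheLine(0x10, 1, "E"), CacheLine(0x11, 2, "M"),
         CacheLine(0x12, 2, "S"), CacheLine(0x13, 3, "I")]
scoreboard = {1: 1, 2: 2}
freed = sweep_dead_lines(lines, live_pids={1}, scoreboard=scoreboard)
print(freed)                      # 2
print([l.mesi for l in lines])    # ['E', 'I', 'I', 'I']
```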

Still another feature of the present invention allows the 
scheduler algorithm to re-assign a scoreboard register even 
though its former process thread information count is not 
zero. This enables the scheduler to utilize a scoreboard 
register of a cold process thread for a pending process thread 
when there are no scoreboard registers otherwise available. 
It would be preferable to have enough scoreboard registers 
available that none need to be shared between different 
processes. 
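The register re-assignment policy can be sketched as follows, assuming the scheduler keeps a normalized temperature for each thread and takes the register of the coldest thread when no register is free. `assign_register` and its parameters are hypothetical, and starting the new thread's count at zero is an assumption; the patent does not say how the cold thread is chosen or what value the scheduler writes.

```python
def assign_register(registers, capacity, new_pid, temperature):
    """Give new_pid a scoreboard register; return the PID whose register was taken, if any.

    registers:   dict PID -> cache-line count (one entry per hardware register)
    temperature: dict PID -> normalized temperature, 0.0 to 1.0
    """
    if new_pid in registers:
        return None  # already has a register
    victim = None
    if len(registers) >= capacity:
        # No free register: reuse the coldest thread's, even if its count is not zero.
        victim = min(registers, key=lambda pid: temperature.get(pid, 0.0))
        del registers[victim]
    registers[new_pid] = 0  # assumed: the count restarts and grows as lines are filled
    return victim

regs = {1: 40, 2: 5, 3: 22}
temps = {1: 0.8, 2: 0.1, 3: 0.5}
print(assign_register(regs, 3, 4, temps))  # 2
print(regs)                                # {1: 40, 3: 22, 4: 0}
```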

An advantage of the present invention is that the sched- 
uler algorithm has extra information available to make a 
more informed decision about which task to schedule next 
and on which processor. This extra information allows the 
system to make better choices that can be measured as 
increased throughput and responsiveness of the system. 
Another advantage is that when a process thread has run on multiple processors in the past, information is available describing the relative cache temperatures on these processors. The scheduler algorithm can determine whether to wait for a busy processor that is associated with the cache having the hottest process thread information or allow an idle processor associated with a cache having a cooler temperature to execute the process thread immediately. This decision may also be based on thread latency, but the present invention allows the scheduler to make a better informed decision as to what processor to use and in what order the process threads should be executed.

The present invention, in a multi-processor environment, may use any or all of the aforementioned embodiments and features, and, in addition, may also enable the scheduler algorithm to quantitatively determine whether to wait for a processor-cache combination that is hot or proceed to another processor-cache combination that is cooler but available. The present invention may accomplish this by comparing the relative temperatures of each process thread between each of the processors in the multi-processor computer system for each of the active process threads. A probabilistic or heuristic approach may be used to decide whether to wait on a hot processor to become available or use a cooler processor that is available.

Other and further objects, features and advantages will be apparent from the following description of presently preferred embodiments of the invention, given for the purpose of disclosure and taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram of a multi-processor computer system;

FIG. 2 is a conceptual diagram of the information stored in main memory and cache memory of the computer system of FIG. 1;

FIG. 3 is a conceptual diagram of the organization of a set associative cache memory;

FIG. 4 is a schematic block diagram of a cache memory according to the present invention;

FIG. 5 is a schematic block diagram of the present invention; and

FIG. 6 is a schematic block diagram of a multi-processor embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is a method and system for keeping track of the number of cache memory lines used for each active process thread in a computer system and then determining how many of those cache lines have retained the information for each of the process threads after the respective process thread has quit running. The percentage of the cache memory lines used by each process thread that still remain may be utilized by the operating system's scheduler algorithm in determining the order of execution of multiple program threads and the association of computer processors therewith. The change in cache temperature for other processes when a particular process is run is also of value. If, for example, you have a process that evicts a lot of information from the cache when it runs, you may not want to run it until other processes have finished. Or you may want to run it for a longer period of time only occasionally. In this case, it would get the same amount of total CPU time, but this type of scheduling would not cause as many cache misses.

The present invention utilizes a plurality of registers to keep track of the number of cache memory lines used to store information for each of the process threads running in the computer system. When a first process thread is running, it requires that a certain amount of information be stored in the cache memory associated with the processor running this first thread. After the processor has executed the first thread for a period of time, the processor will execute subsequent process threads according to the scheduler algorithm. The subsequent process threads also require that information for them be placed in the cache memory of the processor executing the threads. Since the cache memory has limited storage capacity, some of the cache lines must be reused to store new information when running the subsequent process threads. Thus, information for running subsequent process threads overwrites information stored for a thread that was previously running. In the present invention, the number of cache memory lines used to store information for a particular process thread typically grows when that process thread is running. The next time that process thread runs, there may be fewer lines of cache memory that store its information, because other process threads have subsequently evicted some of the cache locations used by the first process and used some of the cache memory for themselves.

The present invention determines the number of cache lines of information for each process thread in registers associated with the process threads. When a process thread is running, the number of lines of cache memory storing information for that process thread is placed into a register. When information for that thread is written to additional lines of the cache memory, the number in that register is incremented, and when information for another process thread overwrites lines of cache (i.e., uses them for that other process thread), the register is decremented. The absolute number of cache memory lines used to store information for each of the process threads may be stored in registers, each register associated with a process thread. The greater the absolute number of cache lines of information for a process thread, the greater its "presence" in the cache memory. The scheduler algorithm can use this set of numbers from the respective scoreboard registers in determining an optimal order of execution of the program threads. However, the number of cache memory lines used to store information for each process thread may not always be indicative of the completeness of information stored in the cache memory for a particular process thread. One process thread may use only a small amount of information and therefore will require relatively few cache memory lines. Another process thread may need a large number of cache memory lines, thus requiring a great deal more of the cache memory. If only the absolute number of cache lines associated with each of these two threads was used, the thread having the largest number in its respective register would always be favored by the scheduler algorithm in determining when to next run that thread. In many applications this suffices and would work well.

Typically, the various levels of cache memory are of different sizes. Everything that is in a higher level of cache is also in the larger, lower levels of the cache. It is noted that various algorithms could be developed that use this information in different ways. For example, one process could have no lines in the L1 cache but everything it needs in the L2. Another process could have some data in the L1 cache
but little of anything else it needs in the L2 or L3 caches. The choice of which one is more optimal to execute might depend on how it is desired for the system to respond. One might be chosen for a system that needed to maximize total throughput, while the other might be chosen if system responsiveness is an issue. It should be noted that without the present invention no choices are possible based on cache temperature.

Preferably, the number of cache lines used for each process thread can be normalized by remembering what number of cache lines were used when a process thread was running and then seeing how much information was evicted from the cache lines when the process thread is ready to run again. This may be accomplished by storing the number of lines of cache used by the running process, then decrementing this number whenever a cache line storing the process information is overwritten. A normalized or percentage value for each of the process threads may be determined by dividing the presently remaining number of lines of cache by the number of lines of cache determined when the process was last running, and storing this normalized or percentage value for each process. This percentage of information retained in the cache for each of the process threads will hereinafter be referred to as their "temperatures." Using the percentage value also takes into account that different process threads will use different amounts of cache memory, and by using the percentage of cache lines remaining, a normalized representation of the temperature of each thread is achieved.

The greater the remaining percentage of cache memory lines used by a process thread, the warmer the process is said to be. By keeping track, or score, of the retained percentage of cache lines for each thread in registers called scoreboard registers, a scheduler algorithm of the computer operating system may objectively determine the most advantageous process thread to run next and which processor should execute this next process thread. In addition, the scheduler can learn and adapt to interaction between various threads. Different threads are going to cause the eviction of different lines of information stored in the cache memory. Running threads in different orders will cause more or fewer evictions to occur. This is especially true in a multi-processor system. For example, running a particular thread may cause every cache line storing information for other threads to be evicted (i.e., information for the running particular thread fills the entire cache memory). This will result in a hot temperature for this particular thread, but all of the other threads would have cold temperatures. Better cache temperatures for the other threads will be maintained if this cache memory intensive thread is isolated to its own processor/cache memory. That would minimize its impact on the cache temperatures for the other threads.

Referring now to the drawings, the details of preferred embodiments of the present invention are schematically illustrated. Like elements in the drawings will be represented by like numbers, and similar elements will be represented by like numbers with a different lower case letter suffix. Referring to FIG. 1, a schematic block diagram of a multi-processor computer system is illustrated. The multi-processor computer system is generally indicated by the numeral 100 and comprises processors 102a-d, first level caches 104a-d, second level caches 106a-d, third level caches 107a-d, main memory 108, PCI/memory bridges and cache controllers 110a-d, SCSI adapter 112, LAN adapter 114, ISA adapter 116, user interface 118, video adapter 120, video monitor 122, PCI bus 124, hard disk 126, tape drive 128, CD ROM drive 130, local area network 132, floppy disk 134, telephone modem 136, keyboard 140 and a cursor pointing device (mouse) 142. The computer system 100 illustrates multi-processors 102a-d and three levels of cache memory 104, 106 and 107 for each processor 102. The present invention is equally applicable to a single processor or multi-processor computer system, each processor having one or more caches. The present invention contemplates and includes multiple level caches having scoreboard registers associated with each cache level. The different levels of cache may have different temperatures on a per process basis.

Referring to FIG. 2, a conceptual diagram of the information stored in main memory 108 and cache memory 106 of the computer system 100 is illustrated. The information 202 is stored in the main memory 108 in byte form (eight bits of information) at physical addresses 204. A line 206 of the cache memory 106 is loaded with information 202 from the main memory 108. The line 206 may contain a multiple of bytes (4, 8, 16, 32, 64, etc.) of information 202 from the main memory 108. The cache block 106 of FIG. 2 illustrates four bytes stored in each line 206; different multiples of the four bytes per line are illustrated as lines 206a, 206b, and so on, and correspond to the respective information bytes 202 of the main memory 108.

A tag block 208 stores the corresponding physical address 210 of the information in the line 206 and a MESI status 212 for the line 206. Each process thread is associated with at least one cache line 206. The physical address 210 stored in the tag block 208 has its least significant bits truncated as appropriate for the number of bytes stored in the line 206. For example, as illustrated, four bytes are stored in the cache line 206; thus, the two least significant bits would be truncated from the physical address 204 of the main memory 108. So long as the cache controller and processor find the required information in the cache lines 206 (a cache hit), all that is required is the upper address bits. Each of the lines 206 contains data and instruction information for each of the process threads running in the computer system 100.

Referring to FIG. 3, a conceptual diagram of the organization of a set associative cache memory is illustrated. The cache memory set illustrated in FIG. 2 and discussed above may be separated into more sets of cache memory. Cache memory sets 306a-d each have associated tag blocks 308a-d. The cache memory, generally illustrated by the numeral 300, is a set associative organization (though any type of cache organization will work with the present invention and is contemplated herein). The cache memory 300 is divided into four sets, though two or more sets will work and are contemplated in the present invention. The sets of the cache memory 300 are denoted by reference numeral 306. Lines 310 contained within each set 306 and the physical addresses of the contents thereof are specified by the associated tags 312. Information bytes from the main memory 108 are spread out over the four sets (ways) of the cache and are not as susceptible to being written over when a cache replacement becomes necessary. As described above for the cache memory of FIG. 2, the addresses in the tag lines 312 represent the physical addresses in main memory 108 from which the information bytes now stored in the cache lines 310 came.

The main memory 108 (FIG. 2) comprises a plurality of memory locations having physical addresses 204. The information required for all active process threads must pass through the cache memory lines 310. The processor 102 utilizes the cache memory lines 310 to rapidly access data
and instruction information for the execution of the process thread. Preferably, whenever the processor 102 issues a request for data and instruction information, a cache memory controller (not illustrated) checks to see if the requested information is already in the cache lines 310. If so, the processor 102 rapidly processes the information from the fast access cache memory 300. When the requested information is not in the cache lines 310, there is a "cache miss." The overall execution time for the processor 102 is slower when a cache miss occurs because the processor must access the slower main memory 108 to retrieve the necessary process thread information therefrom.

Referring now to FIG. 4, a schematic block diagram of a cache memory according to the present invention is illustrated. A cache memory set 402 is connected to a processor 102, a cache controller 404 and an associated tag block 406. The cache set 402 functions in the same way as the cache sets 106 (FIG. 2) and 306 (FIG. 3), and may be set associative like cache 300 (FIG. 3). Each line 408 of the cache set 402 stores process thread information. The tag block 406 stores physical addresses 410 and MESI status 412 for each line 408 of the cache 402, as described hereinabove.

The method and system of the present invention expand the information stored in the tag block 406 by adding storage space for process thread identification 414 for each cache line, and, optionally, a process active status bit 416. Each time information for a process thread is loaded into a line 408 of the cache memory 402, the tag block 406 is updated with the physical addresses of the information bytes from the main memory 108 and a process identification ("PID") indicating which process thread owns the information in the corresponding cache line. According to the present invention, so long as process thread information is stored in a cache line, the corresponding tag line will store the physical addresses of the information, the PID, and, optionally, the active status of the process thread.

Alternatively, it is contemplated that the cache controller 404 could scan the cache memory looking for cache lines containing discontinued process thread information, and, when finding such a line, set the respective MESI register 212 to invalid (see FIG. 2).

An additional embodiment according to the present invention is flagging a discontinued process thread so that the information remaining in the cache memory associated with the discontinued process thread would be the first to be replaced with new information of active process threads. An advantage in using this discontinued process thread flag feature (status bit 416) is that more efficient use of the cache memory is realized, since some of the ambiguous randomness of typical replacement schemes is thus circumvented.

In the preferred embodiment of the invention, circuitry in each level of the cache controllers would be available to cycle through the cache memory to search for and invalidate cache lines belonging to terminated processes, either setting the status bit 416 to invalid or the MESI register 212 to invalid.

Referring now to FIG. 5, a schematic block diagram of an embodiment of the present invention is illustrated. The cache memory set 402 may be a single or multiple set associative memory with the tag block 406 containing both physical addresses 410 and PIDs 414 for each of the process threads active in the computer system. A plurality of PID registers 504 are used to keep track of the number of cache lines 408 having information owned by each of the process threads that are active in the computer system. A "running PID" register 508 stores the PID number for the process thread being executed by the processor 102.

The registers 504 store the number of cache lines 408 having information that are owned by each of the process threads active in the computer system. The contents 506a-n of each of the registers 504a-n, respectively, may be read by the scheduler algorithm software and preferably may also be written to by the scheduler. Each time a cache line 408 is written with information for a process thread, the corresponding physical address of the information is stored in tag line 410 and the number of the process thread that owns the information is also stored in tag 414. The scheduler algorithm software assigns a PID number for each process thread that is active in the computer system.

An embodiment of the present invention reads the contents 506 of the PID registers 504 and uses the information to determine the optimum order in which the process threads should be executed. As an example, if a processor is available and a process exists with much of its information already loaded into that processor's cache memory, then it would be a good choice to schedule it next on that processor.

An analogy may be made that the contents 506 of the PID registers 504 are the temperatures of the respective process threads. The larger the number of cache lines used by a process thread, the hotter its temperature. Conversely, the fewer the number of cache lines used by a process thread, the colder its temperature. Another way of analogizing the process threads in context with the number of lines used in cache memory is to keep "score" of the number of cache lines having information owned by a process thread. Thus, the PID registers 504 may also be referred to as the "scoreboard registers," since the larger the "score," the better the chance that the related process thread will win first place in being executed by the processor. Note that the amount of data that a thread uses varies depending on the nature of the thread. It may be useful to "normalize" these numbers. A thread may be "hot" even though it has less information in the cache than another thread that is "cold."

Also contemplated in the present invention are multiple levels of cache memory (see FIG. 1, caches 104, 106, 107). PID registers for each level of cache may be utilized in determining the temperature of each process for each cache level. A preferred embodiment of the present invention utilizes PID registers 504 that normalize the number of cache lines storing information for each of the process threads. As disclosed above, the total number of lines of cache memory used to store information for a running process thread may be stored either in a hardware peak value register or as a software value, while the PID register 504 contains the actual number of cache memory lines containing information for its respective process thread.

The peak value register always stores the maximum value of cache lines from the PID register, i.e., when the PID register increments to a number value greater than the number value of the peak value register, the peak value register updates to maintain the maximum number of cache lines storing information for a running process thread. During the time this process thread is not running, the number of cache memory lines storing information thereof may be reduced due to other running process threads overwriting some of the cache lines. The present invention can normalize the number value in each PID register 504 by dividing it by the peak number value stored in the peak value register. This gives the normalized or percentage value for each process thread which, as discussed above, may be more representative of the temperatures for the information of the process threads remaining in the cache memory.

The system and method illustrated in FIG. 5 and disclosed above may be applied to a multi-processor computer system
having multi-way set associative cache memories for each of
the multi-processors. Referring now to FIG. 6, a schematic 
block diagram of a multi-processor embodiment of the 
present invention is illustrated. Two processors 602A and 
602B are illustrated for clarity, but any number of processors 
may be utilized with the present invention. Each of the 
processors 602A and 602B has a two-way set associative
cache memory 606A and 606B, respectively. It is contem- 
plated by the present invention to use any number of ways 
for set associative cache memories. In addition, any type of 
cache memory will benefit from an embodiment of the 
present invention when used in conjunction with a single or 
multiprocessor computer system. 
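As an illustration of the FIG. 5 mechanism, the following hypothetical software model keeps a PID field for each cache line, a PID (scoreboard) register and a peak value register for each thread, and a running PID register. It increments the running thread's register when that thread fills a line, decrements the previous owner's register when its line is overwritten, and divides by the peak to obtain the normalized temperature. The class and method names are assumptions; the patent describes hardware registers, not this interface.

```python
class PidScoreboard:
    """Hypothetical software model of the FIG. 5 registers for one cache."""

    def __init__(self, num_lines):
        self.owner = [None] * num_lines  # PID field (414) in each line's tag
        self.count = {}                  # PID registers 504: lines held per thread
        self.peak = {}                   # peak value registers
        self.running_pid = None          # running PID register 508

    def fill(self, line):
        """The running thread writes its information into cache line `line`."""
        pid = self.running_pid
        old = self.owner[line]
        if old == pid:
            return  # the line is already owned by this thread; no change in score
        if old is not None:
            self.count[old] -= 1  # another thread's line is overwritten
        self.owner[line] = pid
        self.count[pid] = self.count.get(pid, 0) + 1
        self.peak[pid] = max(self.peak.get(pid, 0), self.count[pid])

    def temperature(self, pid):
        """Normalized value: lines still held divided by the peak (0.0 to 1.0)."""
        peak = self.peak.get(pid, 0)
        return self.count.get(pid, 0) / peak if peak else 0.0

sb = PidScoreboard(8)
sb.running_pid = 1
for i in range(6):      # thread 1 fills lines 0-5; its peak becomes 6
    sb.fill(i)
sb.running_pid = 2
for i in range(3, 8):   # thread 2 fills lines 3-7, overwriting three of thread 1's lines
    sb.fill(i)
print(sb.count[1], sb.count[2])              # 3 5
print(sb.temperature(1), sb.temperature(2))  # 0.5 1.0
```

Thread 1 now holds fewer lines than thread 2, and its normalized temperature (half of its peak) shows how much of its working set survived.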

In the embodiment illustrated in FIG. 6, cache controllers 
604A and 604B include the PID scoreboard registers 608A 
and 608B, and running PID registers 610A and 610B, 
respectively. Operation of each of the scoreboard registers 608A and 608B, and running PID registers 610A and 610B, is as described for registers 504 and 508 of FIG. 5. The set
associative caches 606A and 606B operate as described for 
the cache memory 300 of FIG. 3, and the single processor 
embodiment of the present invention illustrated in FIG. 4 as 
described hereinabove. 
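With one scoreboard per processor cache, the scheduler can compare a thread's temperature across processors. One simple heuristic for the wait-or-migrate decision is a fixed threshold: wait for a busy processor only if the thread is at least `min_gain` hotter there than on the best idle processor. This is an assumed heuristic offered as a sketch; the patent leaves the probabilistic or heuristic approach unspecified, and `choose_processor` and its 0.25 default are hypothetical.

```python
def choose_processor(temps, idle, min_gain=0.25):
    """Pick a processor for one thread.

    temps: processor -> the thread's normalized temperature in that processor's cache
    idle:  set of processors that are free right now
    Returns (processor, wait), where wait is True if waiting for a busy,
    hotter processor is judged worthwhile.
    """
    hottest = max(temps, key=temps.get)
    idle_temps = {cpu: t for cpu, t in temps.items() if cpu in idle}
    if not idle_temps:
        return hottest, True  # nothing is free; wait for the hottest cache
    best_idle = max(idle_temps, key=idle_temps.get)
    if hottest not in idle and temps[hottest] - temps[best_idle] >= min_gain:
        return hottest, True  # the hot cache is worth the wait
    return best_idle, False   # run now on the warmest idle processor

print(choose_processor({"602A": 0.9, "602B": 0.2}, idle={"602B"}))  # ('602A', True)
print(choose_processor({"602A": 0.4, "602B": 0.3}, idle={"602B"}))  # ('602B', False)
```

A probabilistic variant could replace the fixed threshold with an estimate of the busy processor's remaining time slice; that refinement is not modelled here.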

The scoreboard registers 608A and 608B may also be a 
separate application specific integrated circuit (ASIC), or 
integrated within the microprocessor glue logic. Information 
bits are added to the tag words to facilitate tracking the 
number of lines in the cache memory associated with a 
particular process thread. As an example, eight additional 
bits in a tag word allows tracking of 256 process threads 
with the method and system of the present invention. The 
scheduler algorithm assigns each active process thread a 
label having a unique binary value from 0d (decimal, base 10) to 255d. This label is part of the tag address for as
long as the associated process thread information remains in 
the cache memory. Additionally, the status bit, as discussed 
above, may be implemented to indicate whether the process 
is active or inactive, and may be used in determining 
whether the associated cache line should be written over 
with information for another process thread. This implementation is replicated for each processor cache in a multiprocessor computer system. This can be accomplished in several ways, all of which are contemplated herein. A status bit
indicating the state of the process (live or dead) could be 
added to the tag information for the cache line. This same 
goal could be accomplished by simply marking the cache 
line as "Invalid" if the cache controller conformed to the 
MESI convention. For example, when a cache line contain- 
ing information of an invalid process thread is found, either 
the MESI register 212 (FIG. 2) is set to invalid or the active 
status bit 416 (FIG. 4) is set to invalid by the cache 
controller. The cache controller 604 may do this in a 
"background" mode and thus not significantly impact the 
operation of running thread(s). 
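The tag extension can be illustrated by packing a tag word. The field layout below (2 MESI bits, 1 active status bit, an 8-bit PID label, then the upper address bits) and the MESI encoding are assumptions for illustration; the patent states only that eight additional bits allow 256 process threads and that a status bit may be added. Four-byte lines are assumed, so the two least significant address bits are dropped, as in the FIG. 2 example; set index bits are ignored for simplicity.

```python
MESI = {"M": 0, "E": 1, "S": 2, "I": 3}      # hypothetical 2-bit encoding
MESI_NAMES = {v: k for k, v in MESI.items()}
LINE_BYTES = 4                               # four bytes per line, as in FIG. 2
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # 2 least significant bits truncated

def pack_tag(phys_addr, pid, active, mesi):
    """Build a tag word: [upper address | PID (8 bits) | active (1) | MESI (2)]."""
    if not 0 <= pid <= 255:
        raise ValueError("an 8-bit PID label covers 0d to 255d")
    upper = phys_addr >> OFFSET_BITS
    return (upper << 11) | (pid << 3) | (int(active) << 2) | MESI[mesi]

def unpack_tag(tag):
    """Return (line address, PID, active flag, MESI state) from a tag word."""
    mesi = MESI_NAMES[tag & 0b11]
    active = bool((tag >> 2) & 1)
    pid = (tag >> 3) & 0xFF
    upper = tag >> 11
    return upper << OFFSET_BITS, pid, active, mesi

print(unpack_tag(pack_tag(0x1234, 200, True, "E")))   # (4660, 200, True, 'E')
print(unpack_tag(pack_tag(0x1237, 7, False, "S")))    # (4660, 7, False, 'S')
```

The second call shows the truncation: addresses 0x1234 through 0x1237 share one four-byte line, so they produce the same upper address bits.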

The present invention, therefore, is well adapted to carry 
out the objects and attain the ends and advantages 
mentioned, as well as others inherent therein. While pres- 
ently preferred embodiments of the invention and various 
aspects thereto have been given for purposes of disclosure, 
numerous changes in the details of construction, intercon- 
nection and arrangement of parts will readily suggest them- 
selves to those skilled in the art and which are encompassed 
within the spirit of the invention and the scope of the 
appended claims. 

While the present invention has been depicted, described, 
and is defined by reference to particularly preferred embodi- 
ments of the invention, such references do not imply a 
limitation on the invention, and no such limitation is to be
inferred. The invention is capable of considerable 
modification, alteration, and equivalents in form and
function, as will occur to those ordinarily skilled in the 
pertinent arts. The depicted and described preferred embodi- 
ments of the invention are exemplary only, and are not 
exhaustive of the scope of the invention. Consequently, the 
invention is intended to be limited only by the spirit and 
scope of the appended claims, giving full cognizance to 
equivalents in all respects. 
What is claimed is: 

1. A computer system having cached multi-thread 
processes, said system comprising: 

a processor; 

a cache memory connected to said processor, said cache 
memory having a plurality of memory lines; 

a main memory connected to said cache memory; 

a cache memory controller for transferring a predeter- 
mined number of bytes of information for active pro- 
cess threads from said main memory to said cache 
memory; 

a plurality of scoreboard registers, each of the active 
process threads assigned to corresponding ones of said 
plurality of scoreboard registers, wherein each of the 
corresponding ones of said plurality of scoreboard 
registers store a number value representing the number 
of said cache memory lines that contain information for 
each of the plurality of active process threads; 

logic circuits making available the number value in each 
of said plurality of scoreboard registers to a software 
operating system scheduler program, and 

logic circuits in said cache memory controller that iden- 
tify certain ones of said cache memory lines that 
contain information for discontinued process threads so 
that these certain ones may be reused before other ones 
of said cache memory lines containing information for 
active process threads are reused. 

2. The computer system of claim 1, wherein: 

said cache memory lines are identified by process thread 
identification numbers according to the process thread 
information stored therein; and 

said plurality of scoreboard registers being identified by 
the process thread identification numbers according to 
the active process threads assigned thereto.

3. The computer system of claim 1, wherein said proces- 
sor is a plurality of processors. 

4. The computer system of claim 1, wherein said cache 
memory is a plurality of cache memories. 

5. The computer system of claim 4, wherein said cache 
memory controller is a plurality of cache memory
controllers, each one of said plurality of cache memory
controllers controlling a respective one of said plurality of 
cache memories. 

6. The computer system of claim 1, wherein the certain 
ones of said cache memory lines are identified with a status 
bit in an associated tag block. 

7. The computer system of claim 1, wherein said cache 
memory controller marks the certain ones of said cache 
memory lines as invalid. 

8. The computer system of claim 2, further comprising a 
register containing the process thread identification number 
of the active process being executed by said processor. 

9. The computer system of claim 1, wherein the number 
value in each of said plurality of scoreboard registers 
increments when information is stored in corresponding 
lines of said cache memory. 
10. The computer system of claim 1, wherein the number
value in each of said plurality of scoreboard registers 
decrements when information is deleted in corresponding 
lines of said cache memory. 
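
Claims 2, 9 and 10 describe one scoreboard counter per thread ID that increments when a line is filled with that thread's data and decrements when that data is deleted. The following is a minimal sketch, assuming a small cache whose lines are tagged with the owning thread ID. `TaggedCache`, `store` and `delete` are hypothetical names, and overwriting a line is modeled as deleting the previous owner's data before storing the new owner's.

```python
from typing import Dict, List, Optional


class TaggedCache:
    """Hypothetical cache model: each line is tagged with the ID of the
    thread whose data it holds, and a scoreboard counts lines per thread."""

    def __init__(self, num_lines: int) -> None:
        self.owner: List[Optional[int]] = [None] * num_lines  # None = empty line
        self.scoreboard: Dict[int, int] = {}  # thread ID -> lines held

    def store(self, line: int, thread_id: int) -> None:
        """Fill `line` with data for `thread_id` (claim 9: increment)."""
        if self.owner[line] == thread_id:
            return  # line is already counted for this thread
        self.delete(line)  # evict the previous owner's data, if any
        self.owner[line] = thread_id
        self.scoreboard[thread_id] = self.scoreboard.get(thread_id, 0) + 1

    def delete(self, line: int) -> None:
        """Remove data from `line` (claim 10: decrement the owner's count)."""
        old = self.owner[line]
        if old is not None:
            self.scoreboard[old] -= 1
            self.owner[line] = None
```

For example, storing two lines for thread 7 and one for thread 3, then overwriting one of thread 7's lines with thread 3's data, leaves the counts at 1 for thread 7 and 2 for thread 3.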

11. The computer system of claim 1, wherein the number
of predetermined bytes of information transferred to said
cache memory lines is selected from the group of numbers
consisting of $2^n$ where n is an integer number from 1 to 16.

12. The computer system of claim 1, further comprising: 
logic circuits for storing maximum number values
representing the greatest number of said cache memory lines
that contain information for each of the plurality of
active process threads; and
logic circuits for dividing the number values in each of
said plurality of scoreboard registers by the maximum
number values so as to normalize the number values 
into a percentage from 0 to 100 percent. 
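
Claim 12 normalizes each scoreboard count against a stored maximum. One reading, assumed here, is that the maximum is each thread's own high-water mark, so the normalized value reports how much of its peak cache footprint a thread still holds. The sketch uses integer arithmetic, as a hardware divider would. `NormalizedScoreboard`, `update` and `percent` are hypothetical names.

```python
from typing import Dict


class NormalizedScoreboard:
    """Hypothetical model of claim 12: raw line counts plus a per-thread
    maximum, reported as an integer percentage from 0 to 100."""

    def __init__(self) -> None:
        self.count: Dict[int, int] = {}  # current lines held per thread
        self.peak: Dict[int, int] = {}   # assumed: largest count seen per thread

    def update(self, thread_id: int, delta: int) -> None:
        """Add `delta` lines (positive on fills, negative on deletions)."""
        n = self.count.get(thread_id, 0) + delta
        if n < 0:
            raise ValueError(f"thread {thread_id} cannot hold fewer than 0 lines")
        self.count[thread_id] = n
        if n > self.peak.get(thread_id, 0):
            self.peak[thread_id] = n

    def percent(self, thread_id: int) -> int:
        """count / peak scaled to 0..100; 0 if the thread never held a line."""
        peak = self.peak.get(thread_id, 0)
        if peak == 0:
            return 0
        return 100 * self.count[thread_id] // peak
```

A thread that once held 8 lines and now holds 2 would read 25 percent. Normalizing against total cache capacity instead would be an equally simple variant if that is the intended meaning of the maximum.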

13. A multi-processor computer system having cached 
multi-thread processes, said system comprising: 

a plurality of processors; 

a plurality of cache memories, at least one of each of said
plurality of cache memories associated with each of 
said plurality of processors, and each of said plurality 
of cache memories having a plurality of memory lines; 

a main memory connected to said plurality of cache 
memories;

a plurality of cache memory controllers, one for each of 
said plurality of processors, said plurality of cache 
memory controllers transferring a predetermined number
of bytes of information for active process threads
from said main memory to said plurality of cache
memories; 

a plurality of scoreboard registers, each of the active
process threads assigned to corresponding ones of said 
plurality of scoreboard registers, wherein each of the 
corresponding ones of said plurality of scoreboard 
registers stores a number value representing the number
of the memory lines for each of said plurality of cache 
memories that contain information for each of the 
active process threads, wherein the number value 
stored in each of said plurality of scoreboard registers 
is available to a software operating system scheduler 
program, and 

logic circuits in each of said plurality of cache memory 
controllers that identify certain ones of said cache 
memory lines that contain information for discontinued 
process threads so that these certain ones may be reused 
before other ones of said cache memory lines containing
information for active process threads are reused.

14. The computer system of claim 13, wherein:

the memory lines of said plurality of cache memories are 
identified by process thread identification numbers 
according to the process thread information stored 
therein; and 

said plurality of scoreboard registers being identified by 
the process thread identification numbers according to 
the active process threads assigned thereto. 
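
Claims 13 and 14 give each processor its own cache memory and scoreboard, with the counts readable by the operating system scheduler. These claims do not say how the scheduler uses the values. One plausible policy, assumed here purely for illustration, is cache-affinity placement: run a thread on the processor whose cache currently holds the most lines for it. `warmest_cpu` and its `default_cpu` fallback are hypothetical.

```python
from typing import Dict, List


def warmest_cpu(scoreboards: List[Dict[int, int]], thread_id: int, default_cpu: int = 0) -> int:
    """Return the index of the processor whose cache holds the most lines
    for `thread_id`.

    scoreboards[cpu] maps thread ID -> line count in that processor's cache.
    Falls back to `default_cpu` when no cache holds any lines for the thread.
    """
    best_cpu, best_count = default_cpu, 0
    for cpu, board in enumerate(scoreboards):
        count = board.get(thread_id, 0)
        if count > best_count:  # strict '>' keeps the lowest index on ties
            best_cpu, best_count = cpu, count
    return best_cpu
```

With scoreboards `[{1: 10, 2: 3}, {1: 40}, {2: 3}]`, thread 1 would be placed on processor 1. Thread 2 ties between processors 0 and 2 and goes to processor 0. A real scheduler would presumably also weigh processor load, which this sketch ignores.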



04/22/2004, EAST Version: 1.4.1 



